.Dd December 5, 2023
.Dt LLAMAFILE-PERPLEXITY 1
.Os Llamafile Manual
.Sh NAME
.Nm llamafile-perplexity
.Nd LLM benchmarking tool
.Sh SYNOPSIS
.Nm
.Op flags...
.Sh DESCRIPTION
Perplexity is one of the most common metrics for evaluating language
models. The
.Nm
program can be used to gauge the quality of an LLM implementation. It is
defined as the exponentiated average negative log-likelihood of a
sequence, calculated with exponent base e. Lower perplexity scores are
better.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl h , Fl Fl help
Show help message and exit.
.It Fl m Ar FNAME , Fl Fl model Ar FNAME
Model path (default: models/7B/ggml-model-f16.gguf)
.It Fl f Ar FNAME , Fl Fl file Ar FNAME
Raw data input file.
.It Fl t Ar N , Fl Fl threads Ar N
Number of threads to use during generation (default: nproc/2)
.It Fl s Ar SEED , Fl Fl seed Ar SEED
Random Number Generator (RNG) seed (default: -1, use random seed for < 0)
.Sh EXAMPLE
One dataset commonly used in the llama.cpp community for measuring
perplexity is wikitext-2-raw. To use it when testing how well both your
model and llamafile are performing you could run the following:
.Bd -literal
wget https://cosmo.zip/pub/datasets/wikitext-2-raw/wiki.test.raw
llamafile-perplexity -m model.gguf -f wiki.test.raw -s 31337
.Pp
This can sometimes lead to surprising conclusions, like how Q5 weights
might be better for a particular model than Q6.
.Ed
.Sh SEE ALSO
.Xr llamafile 1
